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DECLARATION OF DR. KAUSHIK SAHA AND MR. SRIIIB NARAYAN MAITI 



Commissioner for Patents: 

We, Dr. Kaushik Saha and Mr. Srijib Narayan Maiti, hereby declare that: 

1 . Dr. Kaushik Saha is a co-inventor of the above-referenced patent 
application ("the present application") and is a researcher and electrical engineer employed by 
STMicroelectronics Asia Pacific Pvt. Ltd., the assignee of the present application. Dr. Saha has 
over 20 years of experience in the field of designing signal processors and embedded signal 
processing systems, and is considered an expert in the field of signal processing. A copy of Dr. 
Saha's Curriculum Vitae is attached hereto as Appendix 1. 

2. Mr. Srijib Narayan Maiti is a co-inventor of the present application and is 
a researcher and electrical engineer employed by STMicroelectronics Asia Pacific Pvt. Ltd., the 
assignee of the present application. Mr. Maiti has over 12 years of experience in the field of 
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designing signal processors and embedded signal processing systems, and is considered an 
expert in the field of signal processing. A copy of Mr. Maiti's resume is attached hereto as 
Appendix 2. 

3. We have reviewed the present application, the Office Action mailed 
February 1, 2010, including the objections to the drawings and the rejections under 35 U.S.C. § 
112, first paragraph, for lack of enablement, and under 35 U.S.C. § 103(a) for obviousness, and 
the references cited therein, and the objection to the Amendment of November 12, 2009, as 
allegedly introducing new matter. We submit this Declaration to the Examiner in support of the 
patentability of the presently claimed invention, which is generally directed to methods of 
computing Fast Fourier Transforms (FFT) or Inverse Fast Fourier Transforms (IFFT) of digital 
signals in multiprocessor systems. 

Objections to the Drawings 

4. We understand the Examiner objected to the drawings for failing to show 
the "structure of computing/performing N-point FFT/IFFT of the signal using first and second 
stages wherein the second stage employs single, un-nested computational loop." We understand 
that none of the claims uses the exact language which the Examiner contends is not shown in the 
drawings. Accordingly, we discuss the independent claims with reference to supporting figures 
below. 

5. We understand that independent claim 1, as amended, recites, "[a] method 
of processing a digital signal by computing a Fast Fourier Transform (FFT) or Inverse Fast 
Fourier transform (IFFT) of the digital signal, the method comprising using a multiprocessor 
computing system having a plurality of processors P configured to perform the steps of: 
computing an N-point FFT/IFFT of the signal using first and second sets of butterfly 
computational stages, each stage in the second set of stages employing a plurality of butterfly 
operations, wherein each of the butterfly operations in each stage in the second set of stages has a 
single, un-nested computation loop; and distributing the plurality of butterfly operations in each 
stage of the second set of stages such that each processor computes an equal number of complete 
butterfly operations thereby eliminating data interdependency among the parallel processors." 
To the extent the Examiner is referring to the multiprocessor system, an embodiment of such a 
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system is show in Figure 4, where a multiprocessing system comprises clusters 12 (i.e., 
processors, see Specification at 7, line 24 to 8, line 25). Figure 5 contains more detail on an 
embodiment of a cluster of Figure 4. To the extent the Examiner is referring to the recited 
method steps, Figures 2 and 3 illustrate implementations of embodiments of the method. Figure 
2 shows a two-processor implementation and Figure 3 shows a four processor implementation, 
where different line styles represent computation in each of the processors. See Page 6, line 25 
to 7, line 22 of the specification as filed. 

6. We understand that independent claim 3 recites, "[a] multiprocessor 
system to process a digital signal by computing a Fast Fourier Transform (FFT) or Inverse Fast 
Fourier transform (IFFT) of the signal using a decimation in time or decimation in frequency 
approach, comprising: the multiprocessor system having a plurality of processors P and 
configured to implement: means for computing a first plurality of log2P stages of an N-point 
FFT/IFFT of the signal; means for computing a second plurality of stages of the N-point 
FFT/IFFT of the signal using in each stage of the second plurality of stages a plurality of 
butterfly operations, wherein each butterfly operation employs a single butterfly computation 
loop without employing nested loops; and means for distributing the butterfly operations in each 
stage of the second plurality of stages such that each processor computes an equal number of 
complete butterfly operations thereby eliminating data interdependency in the stages of the 
second plurality of stages." To the extent the Examiner is referring to the multiprocessor system, 
an embodiment of such a system is show in Figure 4, where a multiprocessing system comprises 
clusters 12 (i.e., processors, see Specification at 7, line 24 to 8, line 25). Figure 5 contains more 
detail on an embodiment of a cluster of Figure 4. To the extent the Examiner is referring to the 
recited functions, Figures 2 and 3 illustrate embodiments of the implementation of the functions. 
Figure 2 shows a two-processor implementation and Figure 3 shows a four processor 
implementation, where different line styles represent computation in each of the processors. See 
Page 6, line 25 to 7, line 22. 

7. We understand that independent claim 5, as amended, recites, "[a] non- 
transitory computer- readable storage medium whose contents cause a system having a plurality 
of processors to perform a linear scalable method of transforming a signal by computing with the 
plurality of processors a Fast Fourier Transform (FFT) or an Inverse Fast Fourier Transform 
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(IFFT) of the signal, the method comprising: computing a first plurality of stages of an N-point 
FFT/IFFT; and computing a second plurality of stages of the N-point FFT without employing 
nested loops and by distributing the butterfly operations in each stage of the second plurality of 
stages such that each processor computes an equal number of complete butterfly operations 
thereby eliminating data interdependency in the stage." To the extent the Examiner is referring 
to a system having a plurality of processors, an embodiment of such a system is show in Figure 
4, where a multiprocessing system comprises clusters 12 (i.e., processors, see Specification at 7, 
line 24 to 8, line 25). Figure 5 contains more detail on an embodiment of a cluster of Figure 4. 
To the extent the Examiner is referring to the recited method, Figures 2 and 3 illustrate the 
implementation of embodiments of the method. Figure 2 shows a two-processor implementation 
and Figure 3 shows a four processor implementation, where different line styles represent 
computation in each of the processors. See Page 6, line 25 to 7, line 22. 

8. We understand that independent claim 16, as amended, recites, "[a] non- 
transitory computer-readable storage medium whose contents cause a system having a plurality 
of processors to perform a linear scalable method of transforming a signal, the method 
comprising: computing an N-point FFT/IFFT using a first plurality of butterfly computational 
stages and a second plurality of butterfly computational stages, each stage in the second plurality 
of stages employing a plurality of butterfly operations having a single, un-nested computation 
loop; and distributing the plurality of butterfly operations in each stage of the second plurality of 
stages such that each processor computes an equal number of complete butterfly operations 
thereby eliminating data interdependency in the stage." To the extent the Examiner is referring 
to a system having a plurality of processors, an embodiment of such a system is show in Figure 
4, where a multiprocessing system comprises clusters 12 (i.e., processors, see Specification at 7, 
line 24 to 8, line 25). Figure 5 contains more detail on an embodiment of a cluster of Figure 4. 
To the extent the Examiner is referring to the recited method, Figures 2 and 3 illustrate the 
implementation of embodiments of the method. Figure 2 shows a two-processor implementation 
and Figure 3 shows a four processor implementation, where different line styles represent 
computation in each of the processors. See Page 6, line 25 to 7, line 22. 

9. We understand that independent claim 27, as amended, recites, "[a] 
method of transforming a digital signal, the method comprising: using a multiprocessor 
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computing system having a plurality P of processors configured to: compute a first number of 
butterfly stages of an N-point Fast Fourier Transform (FFT) or Inverse Fast Fourier transform 
(IFFT); and compute remaining butterfly stages of the N-point FFT/IFFT with a single iterative 
loop wherein each processor computes an equal number of butterfly operations and there is no 
data dependency between butterflies in a stage of an iteration of the loop." To the extent the 
Examiner is referring to a multiprocessor computing system having a plurality of processors, an 
embodiment of such a system is show in Figure 4, where a multiprocessing system comprises 
clusters 12 {i.e., processors, see Specification at 7, line 24 to 8, line 25). Figure 5 contains more 
detail on an embodiment of a cluster of Figure 4. To the extent the Examiner is referring to the 
recited method, Figures 2 and 3 illustrate the implementation of embodiments of the method. 
Figure 2 shows a two-processor implementation and Figure 3 shows a four processor 
implementation, where different line styles represent computation in each of the processors. See 
Page 6, line 25 to 7, line 22. 

10. We understand that independent claim 31 recites, "[a] system, comprising: 
an instruction fetch cache; and a plurality of processors P coupled to the instruction fetch catch 
and configured to: compute a first number of butterfly stages of an N-point Fast Fourier 
Transform (FFT) or Inverse Fast Fourier Transform (IFFT) of a digital signal; and compute 
remaining butterfly stages of the N-point FFT/IFFT with a single iterative loop wherein there is 
no data dependency between butterflies in a stage of an iteration of the loop." To the extent the 
Examiner is referring to a system comprising an instruction fetch cache and a plurality of 
processors, an embodiment of such a system is show in Figure 4, where a multiprocessing 
system comprises clusters 12 (i.e., processors, see Specification at 7, line 24 to 8, line 25). 
Figure 5 contains more detail on an embodiment of a cluster of Figure 4. To the extent the 
Examiner is referring to the configuration of the plurality of processors, Figures 2-5 illustrate the 
implementation of example embodiments. Figure 2 shows a two-processor implementation, 
Figure 3 shows a four processor implementation, where different line styles represent 
computation in each of the processors. Figure 4 shows an N processor implementation. See 
Page 6, line 25 to 7, line 22. 
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The Claims Are Enabled by the Specification 

11. We understand the Examiner rejected claims 1-7, 1 1-13, 15-18, 27-29 and 
31-33 under 35 USC Section 112, first paragraph, as not enabled. We understand the Examiner 
cited claim 1 as an example and contends that "first and second sets of butterfly computational 
stages ... wherein the second stage employs a single un-nested computational loop," was never 
fully addressed in the summary or the detail in such a way that one of skill in the art would be 
able to make or use the invention. We disagree. 

12. We are familiar with the level of skill in the art of digital signal 
processing. After reviewing the specification, one of skill in the art would understand what was 
meant by an un-nested loop in the context of the claims. As an initial matter, one of skill in the 
art would understand the difference between nested and un-nested loops in general. Moreover, 
Figures 2 and 3 show multiple stages of an N-point FFT/IFFT implemented on multiprocessor 
systems, and the description thereof (see pages 6-8) as well as original claim 2 describe how to 
implement at least one embodiment of distributing the remaining stages of the butterfly 
operations among the processors in an un-nested loop (for example, "by assigning to each 
processor of the multi-processor system respective addresses of memory locations corresponding 
to inputs and outputs required for each specific butterfly operation assigned to the processor"). 
See original claim 2. This is sufficient to enable one of skill in the art to practice the claimed 
invention without undue experimentation, which we understand is all that is required to satisfy 
the enablement requirement. We believe the disclosure is sufficient to enable the remaining 
independent claims for similar reasons. For convenience, Figures 2 and 3 of the present 
application are reproduced below, along with portions of the description thereof on pages 6-7. 
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Figure 3 of Present Application 



Figure 2 shows the implementation for an 8-point FFT in a 2-processor 
architecture using the present invention. Dotted lines are computed in one 
processor, and dashed lines in the other. The computational blocks are 
represented by *0\ The left side of each computational block is its input (the time 
domain samples) while the right side is its output (transformed samples). The 
present invention uses a mixed radix approach with decimation in time. The first 
two stages of the radix-2 FFT/IFFT are computed as a single radix-4 stage. As 
these stages contain only load/stores and add/subtract operations there is no need 
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for multiplication. This leads to reduced time for FFT/IFFT computation as 
compared to that with full radix-2 implementation. The next stage has been 
implemented as a radix-2. The three main nested loops of conventional 
implementations have been fused into a single loop which iterates (N/2*(log2 N- 
2))/(number of processor) times. Each processor is used to compute one butterfly 
in one loop iteration. Since there is no data dependency between different 
butterflies in this algorithm, the computational load can be linearly divided among 
the different processors, leading to the linear scalability. 

The mechanism for assigning the butterflies in this manner consists of assigning 
the memory location to a processor such that each processor computes a complete 
butterfly. To achieve this a binary digit is inserted at the appropriate bit location 
in the address of the memory location for input/output data for the computation of 
the butterfly, depending on the stage of the FFT transformation. 

Figure 3 shows a 4-processor implementation for the 8-point FFT using this 
invention. Different line styles represent computation in each of the 4 processors. 

Present Application, Pages 6-7. 

Abel, Alone or in Combination with Jaber. Does Not Render the Claims Obvious 

13. We understand the Examiner rejected claims 1-7, 11-13, 15-18, 27-29 and 
31-33 under 35 USC Section 103(a) as obvious over U.S. Patent No. 5,991,787 issued to Abel et 
al., in view of U.S. Patent No. 6,792,441 issued to Jaber. For the reasons set forth below, we do 
not agree that the combination of Abel and Jaber would render the claimed embodiments obvious 
to one of skill in the art. 

14. The Examiner appears to rely on Jaber as allegedly teaching the 
distributing of the butterfly operations. The operation of Jaber for a 16-point FFT distributed 
among 4 processors is described below with reference to the some of the figures of Jaber, and is 
contrasted with the operation of embodiments of the present application. This discussion is 
illustrative purposes only, and can be generalized to larger FFTs (size N), as well as to different 
numbers of processors. 
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Fig 10 of US6792441 Qaber e t al) 



15. With reference to Figure 10 of Jaber, Jaber teaches that the set of input 
data points (N=16) is divided among 4 processors (P=4) such that each processor works on 4 
data points (i.e. executes 2 butterfly computations) in each stage of an FFT. All 4 processors 
execute this task in parallel without the need to send or receive data to or from the other parallel 
processor, for the first two stages of FFT. This is followed by a combination phase which needs 
the outputs (4x4=16) of all these processors and produces 16 outputs finally. The combination 
phase is comprised of the last 2 (log2P) stages of the FFT. 
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16. As a result butterfly computations involving input data points are assigned 
to the processors of Jaber as follows: 

{fO, f4, f8, fl2} to first processor ^ 

{fl, f5, f9, fl3} to second processor 

{f2, fl6, flO, 04} to third processor Gl 

{f3, f7, fll, fl5} to fourth processor ^ 

17. Until stage 2 of the FFT computation, all 4 processors execute their 
butterfly operations in parallel and independently of each other. Thereafter, stages 3 and 4 of the 
FFT computation are assigned to a combination phase. For the case of a 16-point FFT, the 
computations require 8 (N/2) twiddle factors/coefficients named, {WO, W7}. The first two 
stages require the coefficients WO and Wl only. The combination phase requires the entire set 
of coefficients {W0,...,W7}. This necessitates that the coefficients WO, Wl be accessible to all 
the 4 processors at the same time instant. This is evident from Fig 8 of Jaber, reproduced below, 
which shows a coefficient memory (804) being accessed by all the processors 807 A,. . .807B. 

JL 




Figure 8 of Jaber 
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18. If we consider as a second example the case of a 32 point FFT (N=32) on 
4 parallel processors (P=4), input data points are assigned to the processors as follows: 

{fO, f4, f8, fl2, fl6, f20, f24, f28} to first processor ^ 
{fl, f5, f9, fl3, fl7, f21, f25, f29} to second processor 
{f2, f6, flO, fl4, fl8, f22, f26, f30} to third processor 

{f3, f7, fll, fl5, fl9, f23, f27, f31} to fourth processor 

19. There are total 5 stages in this case. Until stage 3 of the FFT computation, 
all 4 processors execute their butterfly operations in parallel and independently of each other. 
Thereafter, stages 4 and 5 of the FFT computation are assigned to a combination phase. In this 
case too, the combination phase is comprised of the last log2P = 2 stages of the 32 point FFT. 

20. In this case there are 16 twiddle coefficients {W0,...,W15}, out of which 
{W0,W1,W2,W3} are required to be accessible to all the 4 processors. The combination phase 
requires the entire set of coefficients {W0,...,W15}. 

21. In the general case of an N point FFT being executed on P parallel 
processors, there would be a total of log2N stages, out of which the last log2P stages comprise 
the combination phase and the remaining (log2N- log2P) stages are computed in P processors in 
parallel. In the general case, there would be N/2 twiddle coefficients in all {W0,...,Wn/2-1}» out 
of which the identical N/(2P) number of twiddle coefficients need to be accessible to all the P 
parallel processors all the while. The combination phase requires access to all the N/2 twiddle 
coefficients. 
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Figure 8 of Jaber Modified to Show Distribution of Twiddle Coefficients in Jaber 

22. Figure 8 of Jaber appears again above, modified to show the distribution 
of the twiddle coefficients from the coefficient memory as per the Jaber. Out of N/2 coefficients, 
N/2P coefficients are needed by all of the processors all of the time, and the entire set is needed 
by the combinational phase. 
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Butterfly Distribution of Present Disclosure for 4-Processor Configuration: N = 16; P = 4 



23. The butterfly distribution of the disclosure of the present application is 
illustrated above for N = 16 and P = 4. The set of input data points (N=16) is divided among 4 
processors (P=4) such that each processor works on 4 data points (i.e. executes 2 butterfly 
computations) in each stage of the FFT. (A 16 point FFT is shown to facilitate a comparison 
with Jaber; Figure 3 of the present application shows an 8 point FFT on a 4 processor system). 
All 4 processors execute this task in parallel without the need to send or receive data to or from 
the other parallel processors, for the last two stages of the FFT. During the first log2P =2, there 
is data dependency between the parallel processors, i.e. there is need of data to/from each other 
in the first two stages. Therefore, parallel processing of butterfly computations starts after the 
first 2 stages. As shown in the above figure (stage 3, stage 4), the butterflies colored in red are 
computed in the first processor, the butterflies colored in blue are computed in the second 
processor, the butterflies colored in green are computed in the third processor, the butterflies 
colored in black are computed in the fourth processor. It can be seen from stage 4 that all the red 
colored butterflies take input data points only from red colored butterflies of the previous stage 
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(i.e. stage 3), and so forth for the other processors. As a result butterfly computations involving 
input data points are assigned to the processors as follows: 

{x(0),x(8),x(l),x(9)} to first processor 
{x(4), x(12), x(5), x(13)} to second processor 
{x(2), x(10), x(3),x (11)} to third processor 
{x(6) ,x(14), x(7), x(15)} to fourth processor 

24. After stage 2 of the FFT computation, all 4 processor execute their 
butterfly operations in parallel and independently of each other until the final output. Prior to 
that, stages 1 and 2 of the FFT computation have data dependency among the parallel processors. 
For the case of a 16-point FFT, the computations require 8 (=N/2) twiddle factors/coefficients 
{W 0 ,. . .,W 7 }. As shown in the butterfly distribution illustrated on the previous page, the 
requirement of twiddle coefficients by the 4 processors is as follows: 

{w°, w 4 } to first processor 
{w 1 , w 5 } to second processor 
{w 2 , w 6 } to third processor 
{w 3 , w 7 } to fourth processor 

25. It may be noted that owing to the innovative scheme of distribution of 
butterfly computations among processors of the present application, the sets of twiddle 
coefficients required by different processors are disjoint and the number of coefficients in each 
set does not exceed 2. 
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26. If we consider as a second example the case of a 32 point FFT (N=32) on 
4 parallel processors (P=4), input data points are assigned to the processors as follows: 

{x(0),x(16),x(l),x(17), x(2),x(18),x(3),x(19)} to first processor — 
{x(8),x(24),x(9),x(25), x(10),x(26),x(ll),x(27)} to second processor 
{x(4),x(20),x(5),x(21),x(6),x(22),x(7),x(23)} to third processor |" G5 

{x(12),x(28),x(13),x(29), x(14),x(30),x(15),x(31)} to fourth processor^ 



27. After stage 2 of the FFT computation, all 4 processor execute their 
butterfly operations in parallel and independently of each other until the final output. Prior to 
that, stages 1 and 2 of the FFT computation have data dependency among the parallel processors. 
For the case of a 32-point FFT, the computations require 16 (=N/2) twiddle factors/coefficients 
named {W°,...,W 15 }. The requirement of twiddle coefficients by the 4 processors is as follows: 



{w , w , w , w ) to first processor 
{w\ w 5 , w 9 , w 13 } to second processor 
{w 2 , w 6 , w 10 , w 14 } to third processor 
{w 3 , w 7 ,w u , w 15 } to fourth processor 
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28. It may be noted again that owing to the innovative scheme of distribution 
butterfly computations among processors of the present application, the sets of twiddle 
coefficients required by different processors are disjoint and the number of coefficients in each 
set does not exceed 4. 

29. In the general case of an N point FFT being executed on P parallel 
processors, there would be a total of log2N stages, out of which the first log2P stages have data 
dependency between the parallel processors and the remaining (log2N- log 2 P) stages are 
computed in P processors in parallel without need of data to/from each other. In the general 
case, there would be N/2 twiddle coefficients in all {W°,...,W N/21 }, out of which disjoint sets 
N/(2P) of twiddle coefficients are needed by each of the P parallel processors except when P=2. 
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In this case also (P=2), one processor only needs all the twiddle coefficients, whereas the other 
one needs only a subset having N/(2P) twiddle coefficients. 

30. The fundamental difference lies in the method of butterfly distribution 
among the parallel processor as shown in the assignments of butterflies among the parallel 
processors, Gl, G3 for 16-point FFT/IFFT and G2, G5 for 32-point FFT/IFFT. As a result of 
this, the stages where there are dependencies among the parallel processors are in the first log2P 
stages as per the present disclosure, whereas, the dependencies among the parallel processors are 
in the last log2P stages as per Jaber. 

31. Another manifestation of the disclosure of the present application can be 
seen in the uses of twiddle coefficients in the different parallel processors. As per the present 
proposal, it may be noted that although all N/2 coefficients are used for the entire FFT 
computation, no parallel processor uses more than an N/2P subset of these, having no common 
coefficients with any other processors except when P=2. Whereas as per Jaber, an identical 
N/(2P) twiddle coefficients need to be accessible to all the P parallel processors all the while. 
The combination phase of Jaber requires access to all the N/2 twiddle coefficients. Figure 8 of 
Jaber is modified again on the following page to show how the twiddle coefficients would be 
applied if Jaber were modified in accordance with the disclosure of the present application. As 
can be seen, distinct N/2P subsets of the N/2 coefficients in the coefficient memory are used by 
different parallel processors. At any stage no processor has need of any coefficients which are 
required by any other processor. Please note that the present disclose is not limited in any way to 
application to the particular architecture of Jaber. It can be applicable to any system architecture 
in general, including shared and distributed memory systems. The use of shared coefficient 
memory taught in Jaber is rendered unnecessary in the present proposal due to the innovative 
method of distribution of butterflies among the parallel processors. 
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Figure 8 of Jaber Modified to Show Distribution of Twiddle Coefficients 
According to Present Disclosure 

32. Turning to the language of the claims, we understand claim 1 as amended 
recites, "each stage in the second set of stages has a single, un-nested computation loop; and 
distributing the plurality of butterfly operations in each stage of the second set of stages such that 
each processor computes an equal number of complete butterfly operations thereby eliminating 
data interdependency among the parallel processors." Abel, alone or in combination with Jaber, 
does not teach or suggest, or otherwise render obvious to one of skill in the art, "distributing the 
plurality of butterfly operations in each stage of the second set of stages such that each processor 
computes an equal number of complete butterfly operations thereby eliminating data 
interdependency among the parallel processors," as recited. We understand that claims 2, 7 and 
1 1 depend from claim 1 and are allowable at least by virtue of their dependencies. 
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33. We understand that the Examiner contends that because Jaber stores the 
coefficients in memory and retrieves the coefficients from memory there is no data dependency. 
This argument is incorrect because in Jaber, the coefficients are shared between the processors, 
as discussed above. Specifically, out of N/2 coefficients, N/2P coefficients are needed by all of 
the processors of Jaber all of the time, and the entire set is needed by the combinational phase. 
Thus, the stages of Jaber have data interdependencies among the processors. 

34. We understand that independent claim 3 recites, "means for computing a 
second plurality of stages of the N-point FFT/IFFT of the signal using in each stage of the 
second plurality of stages a plurality of butterfly operations, wherein each butterfly operation 
employs a single butterfly computation loop without employing nested loops; and means for 
distributing the butterfly operations in each stage of the second plurality of stages such that each 
processor computes an equal number of complete butterfly operations thereby eliminating data 
interdependency in the stages of the second plurality of stages." Abel, alone or in combination 
with Jaber, does not teach or suggest, or otherwise render obvious to one of skill in the art the 
recited, "means for distributing the butterfly operations in each stage of the second plurality of 
stages such that each processor computes an equal number of complete butterfly operations 
thereby eliminating data interdependency in the stages of the second plurality of stages." As 
discussed above with respect to claim 1, the stages of Jaber have data interdependencies among 
the processors. We understand that claims 4, 12, 13 and 15 depend from claim 3 and are 
allowable at least by virtue of their dependencies. 

35. We understand that independent claim 5, as amended, recites, "computing 
a first plurality of stages of an N-point FFT/IFFT; and computing a second plurality of stages of 
the N-point FFT without employing nested loops and by distributing the butterfly operations in 
each stage of the second plurality of stages such that each processor computes an equal number 
of complete butterfly operations thereby eliminating data interdependency in the stage." Abel, 
alone or in combination with Jaber, does not teach or suggest, or otherwise render obvious to one 
of skill in the art, "computing a second plurality of stages of the N-point FFT without employing 
nested loops and by distributing the butterfly operations in each stage of the second plurality of 
stages such that each processor computes an equal number of complete butterfly operations 
thereby eliminating data interdependency in the stage," as recited. As discussed above, the 
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stages of Jaber have data interdependencies among the processors. We understand that claim 6 
depends from claim 5 and is allowable at least by virtue of its dependencies. 

36. We understand that independent claim 16, as amended, recites, 
"computing an N-point FFT/IFFT using a first plurality of butterfly computational stages and a 
second plurality of butterfly computational stages, each stage in the second plurality of stages 
employing a plurality of butterfly operations having a single, un-nested computation loop; and 
distributing the plurality of butterfly operations in each stage of the second plurality of stages 
such that each processor computes an equal number of complete butterfly operations thereby 
eliminating data interdependency in the stage." Abel, alone or in combination with Jaber, does 
not teach, suggest or otherwise render obvious to one of skill in the art, "distributing the plurality 
of butterfly operations in each stage of the second plurality of stages such that each processor 
computes an equal number of complete butterfly operations thereby eliminating data 
interdependency in the stage," as recited. As discussed above, the stages of Jaber have data 
interdependencies among the processors. We understand that claims 17 and 18 depend from 
claim 16 and are allowable at least by virtue of their dependencies. 

37. We understand that independent claim 27, as amended, recites, "compute 
a first number of butterfly stages of an N-point Fast Fourier Transform (FFT) or Inverse Fast 
Fourier transform (IFFT); and compute remaining butterfly stages of the N-point FFT/IFFT with 
a single iterative loop wherein each processor computes an equal number of butterfly operations 
and there is no data dependency between butterflies in a stage of an iteration of the loop." Abel, 
alone or in combination with Jaber, does not teach, suggest or otherwise render obvious to one of 
skill in the art, "compute remaining butterfly stages of the N-point FFT/IFFT with a single 
iterative loop wherein each processor computes an equal number of butterfly operations and 
there is no data dependency between butterflies in a stage of an iteration of the loop," as recited, 
as discussed above with respect to claim 1, the stages of Jaber have data interdependencies 
among the processors. We understand that claims 28 and 29 are allowable at least by virtue of 
their dependencies. 

38. We understand that independent claim 31, as amended, recites, "compute 
a first number of butterfly stages of an N-point Fast Fourier Transform (FFT) or Inverse Fast 
Fourier Transform (IFFT) of a digital signal; and compute remaining butterfly stages of the N- 
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point FFT/IFFT with a single iterative loop wherein there is no data dependency between 
butterflies in a stage of an iteration of the loop." Abel, alone or in combination with Jaber, does 
not teach, suggest or otherwise render obvious to one of skill in the art, "compute remaining 
butterfly stages of the N-point FFT/IFFT with a single iterative loop wherein there is no data 
dependency between butterflies in a stage of an iteration of the loop," as recited. As discussed 
above, the stages of Jaber have data interdependencies among the processors. We understand 
that claims 32 and 33 are allowable at least by virtue of their dependencies. 



No New Matter Has Been Introduced 

39. We understand the Examiner also objected to the amendment filed with 
the RCE of November 1 1 , 2009 as introducing new matter. Specifically, the Examiner contends 
each of the independent claims introduces new matter based on the limitations "first and second 
sets of butterfly computational stages . . . wherein the second stage employs single un-nested 
computational loop." 

40. We understand that the current independent claims do not contain the 
precise language objected to by the Examiner. Nevertheless, we have attempted to address the 
Examiner's concerns below. 

41. We understand that independent claim 1, as originally filed, recited: 

1 . A linear scalable method for computing a Fast Fourier Transform (FFT) or 
Inverse Fast Fourier transform (IFFT) in a multiprocessing system using a 
decimation in time approach, comprising the steps of: 

computing first and second stages of log2N stages of an N-point FFT/IFFT 
as a single radix-4 butterfly operation while implementing the remaining (log2N- 
2) stages using radix-2 butterfly operations, wherein each radix-2 butterfly 
operation employs a single radix-2 butterfly computation loop without employing 
nested loops; and 

distributing the butterfly operations in each stage such that each processor 
computes an equal number of complete butterfly operations thereby eliminating 
data interdependency in the stage. 

42. To the extent the Examiner objects to the introduction of the word "sets" 
in claim 1 during prosecution, one of skill in the art would have recognized that claim 1 as 
originally filed was addressed to processing two sets of stages (the first set comprising the first 
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and second stages of butterfly computational stages and the second set comprising the remaining 
stages), with the second set (the remaining stages) employing "a single ... computational loop 
without employing nested loops." A "set" has a well-known meaning to one of skill in the art 
and one of skill in the art would have recognized that original claim 1 referred to sets of 
computational stages. Moreover, Figures 2 and 3 illustrates sets of computational stages and the 
description thereof on pages 6-7 is would have been recognized by one of skill in the art as 
directed to handling the first two stages in one manner and the remaining stages in another. 

43. To the extent the Examiner is referring to "a single un-nested 
computational loop" this is almost identical to the original language "a single ... computation 
loop without employing nested loops." Further, as discussed above, one of skill in the art would 
have recognized Figures 2 and 3 and the description thereon on pages 6 and 7 as disclosing 
"computing an N-point FFT/IFFT of the signal using first and second sets of butterfly 
computational stages, each stage in the second set of stages employing a plurality of butterfly 
operations, wherein each of the butterfly operations in each stage in the second set of stages has a 
single, un-nested computation loop." 

44. We understand that independent claim 3, as originally filed, recited: 

3. A linear scalable system for computing a Fast Fourier Transform (FFT) or 
Inverse Fast Fourier transform (IFFT) in a multiprocessing system using a 
decimation in time approach, comprising: 

means for computing first and second stages of log2N stages of an N-point 
FFT/IFFT as a single radix-4 butterfly operation while implementing the 
remaining (log2N-2) stages using radix-2 butterfly operations, wherein each radix- 
2 butterfly operation employs a single radix-2 butterfly computation loop without 
employing nested loops; and 

means for distributing the butterfly operations in each stage such that each 
processor computes an equal number of complete butterfly operations thereby 
eliminating data interdependency in the stage. 

45. Claim 3 as currently submitted does not use the word "sets." 

46. If it is "plurality of stages" to which the Examiner objects in claim 3, 
"plurality" has a definite meaning in the claims of a patent, specifically, more than one, and one 
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of skill in the art would have recognized both "first and second stages" as a first plurality of 
stages, and "remaining stages" as a second plurality of stages. 

47. If it is "employs a single . . . butterfly computational loop without 
employing nested loops," to which the Examiner objects, this language appears in claim 3 as 
originally filed. 

48. We understand that independent claim 5, as originally filed, recited: 

5. A computer program product comprising computer readable program code 
stored on a computer readable storage medium embodied therein for computing a 
Fast Fourier Transform (FFT) or Inverse Fast Fourier transform (IFFT) in a 
multiprocessing system using a decimation in time approach, comprising: 

computer readable program code means configured for computing 
computing first and second stages of log2N stages of an N-point FFT/IFFT as a 
single radix-4 butterfly operation while implementing the remaining (log2N-2) 
stages using radix-2 butterfly operations, wherein each radix-2 butterfly operation 
employs a single radix-2 butterfly computation loop without employing nested 
loops; and 

computer readable program code means configured for distributing the 
butterfly operations in each stage such that each processor computes an equal 
number of complete butterfly operations thereby eliminating data interdependency 
in the stage. 

49. Claim 5 as amended does not use the word "sets." 

50. If it is "plurality of stages" to which the Examiner objects, "plurality" has 
a definite meaning in the claims of a patent, specifically, more than one, and one of skill in the 
art would have recognized both "first and second stages" as a first plurality of stages, and 
"remaining stages" as a second plurality of stages. 

51. If it is "without employing nested loops," to which the Examiner objects, 
this language appears in claim 5 as originally filed. 

52. With regard to independent claim 16, to the extent the Examiner objects to 
the word "plurality," "plurality" has a definite meaning in the claims of a patent, specifically, 
more than one, and one of skill in the art would have recognized both "first and second stages" 
(see original claim 5) as a first plurality of stages, and "remaining stages" (see original claim 5) 
as a second plurality of stages. To the extent the Examiner objects to "having a single, un-nested 
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computational loop," please see original claims 1, 3 and 5, and the discussion of claims 1, 3 and 
5 above. 

53. With regard to independent claims 27 and 31, to the extent the Examiner 
objects to "a first number of butterfly stages" one of skill in the art would have recognized "first 
and second stages" (see original claim 5) as a first number of stages, and "remaining stages" (see 
original claim 5) as "remaining stages." To the extent the Examiner objects to "a sinjgle iterative 
loop," the concept of a single iterative loop is discussed on page 7, lines 8-13. 

54. We hereby declare that all statements made herein of our own knowledge 
are true and that all statements made on information and belief are believed to be true; and 
further that these statements were made with the knowledge that willful false statements and the 
like so made are punishable by fine or imprisonment^ or both, under Section 1001 of Title 18 of 
the United States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 

Date ' Dr. Kaushik Saha 



Date „ Mr. Srijib Narayan Maiti 
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NAME: 



ULUM-VITAE OF KAUSHIK SAHA 

Dr. KAUSHIK SAHA 



ADDRESS FOR 
CORRESPONDENCE: 



A-2 Staff Flats, 
I. P. College, 
Shamnath Marg, 
Delhi- 110054 
India 

ph:+91-l 1-2392-9931 
cell: +91-98110-64398 
email: kaushik.saha@st.com 



DATE OF BIRTH: 1 4th April, 1 965 
EDUCATIONAL QUALIFICATIONS: 



DEGREE 
B.Tech. 

M.Tech. 



Ph.D. 



INSTITUTION 
Indian Institute Of 
Technology, Delhi 
Indian Institute Of 
Technology, Delhi 



Indian Institute Of 
Technology, Delhi 



DISCIPLINE YEAR 
Electrical 

Engineering 1983-87 

Computer 1988-89 

Technology 

(Dept. Of Elect. Engg.) 



Electrical 
Engineering 



1990-96 



EXPERIENCE: 



Research: 



1) 



2) 



M.Tech. dissertation work was done in speech recognition using the 
neural network computing paradigm. This required extensive 
application of digital signal processing. 

Subsequently, I have studied issues concerning realization of neural 
networks using LC. technology for my PhD thesis. I have looked at 
design issues in using analog circuits and have designed a useful 4- 
quadrant analog multiplier which is compact and can be used for diverse 
applications like signal processing and neural networks. 



Industrial: 

.At STMicroelectronics (Since 1996 till date, descending chronologically) 



Currently, I am the Design Unit Manager and Technical Mentor for the 
Advanced Systems Technology group of ST-Noida, which is the systems 



research wing of the company. My group is engaged in advanced processor 
architecture and application development for various multimedia platforms 
of STMicroelectronics and ST-Ericsson. I have been with AST since 2001. 

1) I lead teams of Advanced Systems Technologies (AST) group 
which are working on the design of digital audio and video 
encoders/decoders on advanced processor architectures (VLIW 
and SIMD). 

2) The AST group led by me also works in the area of digital 
multimedia algorithm development on parallel (MIMD) 
architectures. We are currently engaged in porting and developing 
the Google Android stack on the ARM SMP based MontBlanc 
platform from ST-Ericsson, over the Linux-SMP operating 
system. The group is also engaged in development of Power 
Management strategies for the multi-core MontBlanc platform 
over the Linux-SMP operating system. These projects are 
executed for our clients ST-Ericsson. 

3) I lead the development of advanced video IPs and algorithms. My 
group is currently engaged in the development of the Context 
Adaptive Binary Arithmetic Coder (CAB AC) hardware IP for the 
Fairbanks platform from ST-Ericsson, using High Level 
Synthesis. We are also developing a decoder architecture for the 
next generation video compression standard called Scalable Video 
Coding (SVC) for the Home Entertainment Division (HED) of 
STMicroelectronics. 

4) I have led the AST effort on OpenMax 1 .0 standard development 
with Khronos and the development of the OpenMax 1 .0 reference 
source code (Bellaggio). The next evolution i.e. OpenMax 2.0 is 
also being developed in AST Noida under my management. 

Projects done prior to AST assignment (prior to 2001) 

5) Implementation of a V.90(56Kbps) and V.34(33. 6Kbps) 
compliant wireline modem on a Pentium P2 platform under 
WinNT & Win2000. The challenge of the project was in matching 
the stringent real-time requirements of modem operation with the 
non-real time performance of the Windows Operating System. I 
was the leader of this project and the project team had 7 members. 

6) Implementation of a HDTV receiver for DVB-terestrial television 
using the COFDM standard. I was involved in the design and 
implementation of the timing and frequency synchronization 
algorithm, as per CCETT recommendations, on a proprietary DSP 



core. 



7) Design and implementation of a DVD drive front end. I designed 
the specifications for the real time kernel, command parser and the 
servo control module. These three components formed the 
embedded system for the reception and interpretation of 
commands from the DVD drive back end and the servo actions 
corresponding to the commands. The kernel and the command 
parser were implemented on a ST 10- 166 microcontroller. 
Subsequently, I did the design and implementation of the servo 
compensator for the optical tracking and focus mechanism for the 
aforementioned DVD drive on a proprietary DSP core. 

8) I worked for the Memory Products Group of STMicroelectronics- 
India (SGS-Thomsom Microelectronics) from Feb.l996-Jan 1997. 
I have had exposure to the design of asynchronous SRAMs and 
have been actively involved in the design of 2 fast SRAM parts. 

9) I also contributed to a project dealing with Capacitive Discharge 
Ignition Control for combustion efficient, clean exhaust 2 wheeler 
vehicles. My role was to design the ignition control algorithm, 
which had very stringent accuracy requirements in terms of angle 
and duration of firing control. The patent application for this 
algorithm has been filed. 

10) I was the leader of a project for the design of an intelligent battery 
charger for mobile phones. This was meant to demonstrate the 
capabilities of a new battery charging device from 
STMicroelectronics. I personally designed an inflection point 
detection algorithm for this project. 

11) I have been the technical leader for the Smart Card activity of ST- 
Noida in India and have contributed actively in the Smart Card 
Federation of India, which is a standardization body for smart card 
applications, under the umbrella of the Ministry of Information 
Technology, Government of India 

Academic Activities: 

1) I have been holding an adjunct faculty position in Dept. of Electrical 
Engineering, IIT, Delhi, since 2004-05, where I teach two M.Tech/Pre-PhD. 
level courses on VLSI design. The course titles are 

i. Issues and Challenges in Deep Sub Micron VLSI Design. 

ii. Design and Testing of Semiconductor Memories 

2) I have been instrumental in setting up the ST-IITD research collaboration 
between STMicroelectronics and Indian Institute of Technology, Delhi, India, 



which is the first of its kind in the region. 
3) I have been actively involved in the academic activities of Deptt of 

Electronics, Delhi College of Engineering (now DTU), through delivering 
tutorials, examination of M.E. students, curriculum formation meetings and 
advice to faculty. 

KNOWLEDGE LEVEL: 

(a) During M.Tech., I have done courses in digital design and signal 
processing, having obtained good grades and have done practical work 
in the areas. I have done well in courses on MOS L.S.I, design and 
V.L.S.I. CAD and have done useful software development in both. 

(b) I have held teaching assignments at the Indian Institute of Technology, 
Delhi, India, where I teach advanced (Master's level) courses on VLSI 
design. 

(c) I am very well conversant with SPICE and have been able to carry out 
modifications in the package obtain better numerical performance. 

(d) As I have done Ph.D. research work in analog VLSI design (please 
refer to the annexure), I consider myself especially strong in the field. 

(e) I have had good exposure to the design and testing of SRAMs. 

(f) Strong background in DSP, especially in adaptive signal processing 
algorithms and analysis of speech and image signals. 

(g) I have good practical and theoretical experience in designing systems 
with microcontrollers, from the h/w as well as s/w points of view. 

(h) I have a strong software background, especially in C and C++ and in 
the theory and practice of operating systems. 

(i) I have a good background in Mathematics, especially Linear Algebra, 
Numerical Analysis and Functional Analysis. 

(j) I have a good background in parallel computing, specifically for the 

area of embedded applications, holding patents in the area. 



PROJECTS UNDERTAKEN: 

At STMicroelectronics (Since 1996 till date, descending chronologically) 

(1) Development of Context Adaptive Binary Arithmetic Coding hardware IP 
using High Level Synthesis for ST-Ericsson Fairbanks platform 

(2) Development, porting and profiling of Google Android software stack on ST- 
Ericsson MontBlanc platform over Linux-SMP OS 

(3) Research and development of Power Management algorithms for ST-Ericsson 
MontBlanc platform over Linux SMP OS Power Management infrastructure 

(4) Participation on OpenMax 1.0 standard development and development of 
OpenMax reference software (Bellaggio) 



(5) Research and development of decoder architecture for the Scalable Video 
Coding (SVC) standard for Home Entertainment Division (HED) of 
STMircoelectronics. 

(6) Development of parallel algorithms for MPEG audio/video decoding on Linux 
using MPI and shared memory programming (SMP) paradigms. 

(7) Design and implementation of VDSL modem (50Mbps) on a VLIW 
architecture 

(8) Design and implementation MPEG2AAC audio decoder on a VLIW 
architecture 

(9) Design and implementation Dolby AC3 audio codec on a VLIW achitecture 

(10) Design and implementation of a Softmodem with modulation protocols 
uptoV.90. 

(11) Design and implementation of the timing and frequency synchronisation 
algorithms for a HDTV receiver based on the COFDM DVB-Terrestrial 
standard. 

(12) Design and implementation of the servo compensators for the tracking 
and focus mechanisms of a DVD drive. 

(13) Design and specification of a DVD drive front end mechanism as a real 
time embedded system. 

(14) Design of an intelligent battery charger for mobile phones. 

(15) Capacitive Discharge Ignition Control Algorithm. 

(16) Design of a fast (12ns read-write access) 128Kx8 asynchronous SRAM. 

The following projects were done as a part of my Masters and Doctoral studies (1988- 
1995): 

(17) Development of design strategies for analog neural networks. (Doctoral 
Thesis) 

(18) Development of a partitioning method for reducing the connectivity of 
highly connected neural networks for easy LC. realization. (Doctoral Thesis) 

(19) Phoneme recognition from continuous speech using neural network 
(Masters Thesis). 

(20) Speech compression to low bit-rates under the celp32 standard. (Project 
Guidance) 

(21) Development of an adaptive strategy for lowering bit rates in MPEG 
compression. (Project Guidance) 

(22) Speech compression to low bit-rates under the LDCELP and CELP32 
standard (project Guidance). 

(23) Modification of SPICE(3b) for improving solution quality in the case of 
ill-conditioned circuit matrices i.e. for MOS devices in weak 
inversion. (Doctoral Study) 

(24) Development of an RTL using YACC and LEX under UNIX on HP- 
9000s300 machine. (Lab Project) 

(25) Development of a package for module placement using simulated 
annealing method (Lab Project). 



Patents 



1 . " METHOD AND APPARATUS FOR LINEARLY 
SCALABLE FFT/IFFT" — filed 

2. "METHOD AND APPARATUS FOR LINEARLY 
SCALABLE FIR FILTER WITHOUT DELAY LINE"— 
Granted 

3 . "METHOD FOR MULTIPROCESSOR FFT/IFFT WITH 
MINIMUM INTER-PROCESSOR DATA 
COMMUNICATION OVERHEAD"— Filed 

4. "MEANS FOR ENHANCING PERFORMANCE OF 
FFT/IFFT UTILISING MEMORY LOAD/STORE UNITS 
WITH AUGMENTED DATA BUS INSTEAD OF 
MULTIPLE MEMORY LOAD/STORE UNITS "—filed 

5. " PARALLEL IMPLEMENTATION OF MPEG-2 VIDEO 
DECODER IN DISTRIBUTED MEMORY SYSTEM" - 
filed 

6. LOW COMPLEXITY & LOW COST SOLUTION TO 
CLOCK RECOVERY PROBLEM IN THE PRESENCE OF 
LARGE CLOCK JITTER IN DVB-TERRESTRIAL - 
Granted 

7. A SYSTEM AND METHOD FOR A FAST CLOCK 
RECOVERY IN DIGITAL VIDEO COMMUNICATION - 
Filed 

8. CHANNEL SELECTION - filed 

9. A METHOD FOR ESTIMATING NUMBER OF BITS TO 
ENCODE A MACROBLOCK AND A BIT ESTIMATOR - 
Filed 

10. AN INNOVATIVE APPROACH FOR ADAPTATION OF 
GENERIC RATE CONTROL ALGORITHMS FOR 
TARGET VIDEO STANDARDS - Filed 

Parallel Implementation of MPEG-2 Video Decoder, Sarkar,A., 
Maiti, S, Saha, K. - IS&T/SPIE's Symposium on Electronic 
Imaging: Science & Technology, San Jose, Ca, USA, Jan 2006. 
Current and future trends in embedded VLIW microprocessors 
applied to multimedia and signal processing, Desoli G, Strudel 
T, Cousin J-P., Saha K., Invited Paper, 14th European Signal 
Processing Conference, Florence, Sept 2006. 
Parallel Implementation of Dolby- AC3 Audio Decoder on 
Multi-Processor SMP platform, Soni Y.K., Gupta A., Saha K., 
ST Journal of Systems Research, vol 1,2007 
Scalable Integer based Frequency Error Estimation Technique 
for Clock Recovery in Packet Switched Networks, Mathur M. 
and K., Internet and Multimedia Systems and Applications 
(IMSA 2007) , Aug 2007, Honolulu, Hawaii, USA 



Publications 

1) 

2) 

3) 
4) 



5) An innovative approach for adaptation of generic rate control 
algorithms for target video standards, Agarwal M, Johar S., 
Saha K., Piccinelli E., IEEE International Conference on 
Signal Processing and Communications (ICSPC 2007), Nov 
2007, Dubai, UAE 

6) Pre-coded video transcoding into H.264 with arbitrary 
resolution change, Tripathi S, Saha K., Piccinelli E., IEEE 
International Conference on Signal Processing and 
Communications (ICSPC 2007), Nov 2007, Dubai, UAE 

7) Real-Time SVC Decoder in Embedded System, Maiti S.N., 
Gupta A, Saha K., Piccinelli, E.M., International Conference 
on Signal Processing and Multimedia Applications, SIGMAP 
2009, July 2009, Milan, Italy 

8) AVC to SVC Decoding: Overhead Analysis in Real Time 
Embedded System, Maiti S.N., Gupta A, Saha K., Piccinelli, 
EM., International Conference on Signal Processing and 
Multimedia Applications, SIGMAP 2010, July 2010, Athens, 
Greece, (Paper communicated) 

PhD Theses supervised. 

1) Studies in architectures and applications of low-power ad-hoc 
wireless networks - Kumar M, PhD Program in Electronics and 
Communination, BITS Pilani. 

(Thesis in progress) 

2) Architecture and Design of >10Gbps I/Os - Monga S., PhD 
Program in Electrical Engineerind, IIT-Delhi. 

(Thesis in progress) 



M.Tech / MS Theses supervised: 

1) H.263 Encoder for Videoconferencing: Raizada R., MS in 
Software Systems, BITS Pilani, 2003 

2) H.263 Decoder for Videoconferencing: Narula S., MS in 
Software Systems, BITS Pilani, 2003 

3) Digital Set Top Box for Indian Market Requirements: Sharma 
M.K., MS in Software Systems, BITS Pilani, 2004 

SOFTWARE KNOWLEDGE: 

(a) Well versed in C++, Pascal, Fortran, Lisp and the theory and practice 
of operating systems. 

(b) Proficient in DSP algorithms and programming. Experience with 



various brands and versions of DSPs 

(c) Methodologies for architecture space exploration for advanced CPU 
design. 

(d) Linux and MS Windows device driver development 

(e) Good background in computer simulation and simulator development. 

Foreign langauges spoken : French (Fluent), Italian (Fluent),German (Intermediate) 
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Contact Address Permanent address 

22B, Express View Apartment, MIG, R-5/1 , Duk Bungalow Road, Saratpalli, 

Sector-93, Noida-201304, India. Midnapore (W), West Bengal-721 101, India 

E-mail: sriiibmaiti@gmail.com . Telephone: 00-91 -98 10403772(Mobile). 



Objective: To excel in embedded technologies through R&D activities in multimedia and 
parallel processing. 



Total Industrial Experience : ~ 13 years. D.O.B.: 24.10.1975 Sex: M 



Present Status: Working with STMicroelectronics Pvt. Ltd., Inida, 

in Advanced System Technology group. 



Experience Summary 


Major Achievements 


~ R&D in parallel processing, multimedia 
codecs (MPEG2, H.264, SVC, MVC) 
architecture design and implementation. 

- Experienced in working with ST200 
(Family of VLIW processor), optimization 
(up to assembly level), Architecture 
Evaluation, Feasibility and porting of 
Multimedia Applications and DSP 
Algorithms. 

- Design and development experience in 
DLC (Digital Loop Carrier) system (PSTN). 

- Implementation knowledge about CMM, 
ISO. 

- Work as patent committee member to 
evaluate patent proposals 

- ISO/ TS 16949 auditor. 


- 6 patents on FFT, FIR, MPEG2, SVC 

video decoding. 

- Paper publications in International 
conferences. 



Educational Qualifications 



-Masters (M.E. in Electronics and Communications Engineering, DCE, 2004, 80%). 
-Graduation (B.Tech in Electronics and Communications Engineering, 
R.E.C.Kurukshetra, 1997, 74.5%). 



Employment History 



Sr. no. 


Establishment 


Duration 


Designation 


1. 


STMicroelectronics Pvt. 
Ltd. 


Sept'2000 to till date 


Technical Specialist 


2. 


HFCL( R&D) 


Feb'99 to August 2000 


Deputy manager, 
Assistant manager. 


3. 


Indian Telephone Industries( 
ITI) Ltd. 


Oct'97 to Jan'99 


Asst. Executive 
Engineer. 



Project Experience 



1. Multi View Coding (MVC) architecture design/product realization 

• Client: STMicroelectronics internal 

• Description: ST is in development of SoC which is capable of stereo MVC 
decoding for HD resolution. ST is one of the leading hardware IP (intellectual 
property) providers for H.264 decoding/encoding. This project is part of its 
future product development. 

• Responsibility: Understand MPEG standard for MVC and find out the 
additional functionalities as compared to H.264 decoding. Analyze whether 
the existing architecture of H.264 decoding is capable of SVC decoding. 
Then, design architecture and implement MVC decoder in SoC. 

2. Scalable Video Coding(SVC) architecture design/product realization 

• Client: STMicroelectronics internal 

• Description: ST is in development of SoC which is capable of SVC decoding 
for HD resolution. This project is also a part of its future product 
development. 

• Responsibility: Understand and profile the SVC decoder starting from 
standard source code. Find out the additional functionalities as compared to 
H.264 decoding. Analyze whether the existing architecture of H.264 decoding 
is capable of SVC decoding. Then, design architecture and implement SVC 
decoder in SoC. 

• Achievements: 

-Patent filed on SVC video decoder, titled "A Method and System for 
Parallel SVC decoder". 

-Paper accepted in SIGMAP2009. 

3. Parallelisation of H.264 Video decoder 

• Client STMicroelectronics internal 

• Description: Job is to find out a method to parallelize H.264 and implement it 
for multiprocessor platforms, especially on shared memory systems. 

• Responsibilities: To work as a team leader as well as team member in design 
and development. 

4. Parallelisation of MPEG-2 Video decoder 

• Client STMicroelectronics internal 

• Description: Task was to find out a method to parallelize the advanced video 
codecs and implement in multiprocessor platforms both in distributed memory 
and shared memory systems. Subsequently, integration of parallel (Pthread) 



MPEG2 video decoder to a real time DVD player (MPlayer-0.92) by porting 
of this DVD player s/w on a SMP platform. 

• Responsibilities: To work as a tern leader as well as team member in design 
development and simulation/testing. 

• Achievements: 

-Patent filed on MPEG2 video decoder, titled "A MACRO-BLOCK 
LEVEL PARALLEL IMPLEMENTATION OF VIDEO 
DECODER". 

-Paper accepted in IS&T/SPIE's Electronic Imaging 2006. 

5. Parallel algorithm development for DSP-FE functions of VDSL modem 

• Client: STMicroelectronics internal 

• Description: It is visualized that the ST200 family of Very large Instruction 
Word (VLIW) processors, jointly developed by STMicroelectronics and 
HP Corporation, may be used as a replacement for conventional DSP 
processors for application in the area of Very High-Speed Digital Subscriber 
Line (VDSL) modems. STMicroelectronics had a chipset for the VDSL 
application, based on the Zipper-DMT standard. This chipset consists of the 
Analog front-end, DSP front-end (DSP-FE) and ATM processor. It was 
important that a programmable processor based alternative solution be 
explored, to deal with programmability and upgradeability issues. The 
motivation behind the work was to explore the possibility of implementing all 
or some of the DSP-FE functions on the Lx VLIW processor. 

• Responsibilities: To develop algorithm and simulate all the above 
computational processes for multi-processor systems like Lx family of VLIW 
processor. 

• Major Achievements 

-Patent filed on FIR (Finite Impulse Response) filter, titled "Method 
and Apparatus for Linearly Scalable FIR Filter without Delay Line". 

-Patent filed on FFT/IFFT, titled "Method and Apparatus for Linear 
Scalable FFT/IFFT". 

6. Development of parallel algorithm for FFT/IFFT in Multi-processor 
systems with distributed memory architecture 

• Client: STMicroelectronics internal 

• Description: FFT/IFFT algorithm is developed in such a way that the data 
communication overhead between processors are minimum and the overall 
execution time for a given order of FFT/IFFT is reduced in the order of 
number of processors. So a cluster of 4 PCs (with Linux as operating 
system) is set up to simulate the algorithm with LAM-MPI. 



• Major Achievements 



-Patent filed, titled "Method for Multi-processor FFT/IFFT with 
Minimum Inter-processor Data Communication Overhead". 

-Patent filed on FFT/IFFT, titled "Means for Enhancing 
Performance of FFT/IFFT Utilizing Memory Load/Store Units with 
Augmented Data bus Instead of Multiple Memory Load/Store Units ". 

Architecture Design and Development of subscriber line interface cards for 
Optical Access Networks (DLC). 

Client: HFCL (Himachal Futuristic Communications Ltd.) internal. 
Description: Various types of subscriber lines viz. POTS, E&M, 2- Wire loop, 
ISDN-BRI, ISDN-PRI, 64Kbps co-directional data, Trunk line, El leased, 
HDSL, ATM, ADSL are terminated in one line card in Remote Terminal( 
RT). There are STM-1 ring running among the RTs. The bearer data is 
transported to the Central office (CO). The signaling date is processed and 
sent to the CO through the V5.2 protocol. 
Line Cards developed: ISDN-BRI, ISDN-PRI, El leased lines. 
Responsibilities: Detailed analysis of system followed by preparation of 
functional requirement specifications for ISDN-BRI card. Design of circuit 
schematics for the above cards, followed by preparation of software 
requirement specifications and design. 



